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Abstract 



We construct importance sampling schemes for stochastic differential equations "with 
^\i , small noise and fast oscillating coefficients. Standard Monte Carlo methods perform 

poorly for these problems in the small noise limit. With multiscale processes there 
are additional complications, and indeed the straightforward adaptation of methods for 
standard small noise diffusions -will not produce efficient schemes. Using the subsolution 
approach we construct schemes and identify conditions under -which the schemes -will 
i-G ' be asymptotically optimal. Examples and simulation results are provided. 

t^ ■ 
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T:f ; 1 Introduction 

^'j i In this paper we study efficient importance sampling schemes for simulating rare events 

C^ ' associated -with the d-dimensional stochastic differential equation (SDE) 



l.fx.w.^Vc^-w'^"*" 



dt + V~e(T{X'it),^^\dW{t), 



^ ■ X'{0) = X, (1) 

C^ ■ -where 6 = (5(e) | and 

- — )• oo as e I 0, (2) 



and W{t) is a standard d-dimensional Wiener process. The functions b{x , y) , c{x , y) and 
<t(x, y) are assumed to be smooth in each variable and periodic -with period A in every 
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direction with respect to the second variable. The extension to the first order Langevin 
equation model with non-periodic random environment will also be discussed. 

The need to simulate rare events occurs in many application areas, including telecommu- 
nication, finance, insurance, and computational physics and chemistry. However, virtually 
any simulation problem involving rare events will have a number of mathematical and com- 
putational challenges. As it is well known, standard Monte Carlo sampling techniques 
perform very poorly in that the relative errors under a fixed computational effort grow 
rapidly as the event becomes more and more rare. Estimating rare event probabilities in 
the context of diffusion processes with fast oscillating coefficients presents extra difficulties 
due to the additional small parameter 6 and its interaction with the intensity of the noise 
e. 

A potential application of the methods presented in this paper is to chemical physics 
and biology, such as problems involving the folding and binding kinetics of proteins. These 
models usually involve rugged potential surfaces of a complex hierarchical structure with 
potential minima within potential minima, separated by barriers of varying heights due to 
the presence of multiple energy scales. In some cases, one can approximate the dynamics 
by a diffusion in a rough potential where a smooth potential function is superimposed by 
a rough function (see Figures [1] and [2]) . A representative, but by no means complete, list 
of references is [21 [8l EU [32l [34l HH Si]. It turns out that these models often can be 
approximated by homogenized systems where the effect of the multiscale nature is partially 
captured by the effective diffusivity of the system. The formulas for the effective diffusivity 
in the aforementioned chemistry and biology literature coincide with those produced by the 
approximation via homogenization, which justifies our assumption that e and 5 are related 
according to ([2]). Note that the condition ([2]) corresponds to Regime 1 in [13], where the 
sample path large deviation properties of multiscale diffusions are studied under various 
regimes. 

The aim of this paper is to present a more efficient approach to the sampling problem 
for multiscale diffusions. Using the large deviation and weak convergence results from |13j . 
we show how to construct asymptotically optimal importance sampling schemes with rig- 
orous bounds on performance. The construction is based on subsolutions for an associated 
partial differential equation as in [15] . However, it becomes applicable only after significant 
modifications that take into consideration the multiscale aspect of the model. More pre- 
cisely, changes of measure that are purely based on the homogenized system and directly 
suggested by its associated partial differential equation do not lead to efficient importance 
sampling schemes. Instead, appropriate modifications involving the solution to a so-called 
auxiliary "cell problem" have to be made in order to achieve asymptotic optimality. This is 
consistent with the large deviations results obtained in [13] , where a change of measure (or 
equivalently a control) in partial feedback form has to be used to prove a large deviation 
lower bound. By "partial feedback" we mean that the change of measure is a function of 
the fast variable X^ jb. In the present paper a control in full feedback form, i.e., a function 
of both the slow variable X'^ and the fast variable X^ /5, will be used to construct dynamic 
importance sampling schemes with precise asymptotic performance bounds. 

To the best of our knowledge, this is the first work to address the design of asymptot- 
ically optimal importance sampling schemes for multiscale diffusions. Related importance 
sampling problems for regular small noise diffusions without fast oscillations have been re- 
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cently considered in [l6], where the schemes are based on the solution to the corresponding 
Hamilton- Jacobi-Bellman (HJB) equation, and in |24] . The present work is also related to 
the theory of homogenization of HJB equations [H El [El [23 [311 [33] • 

The paper is organized as follows. In Section [2] we review the concept of importance 
sampling and the role of subsolutions to certain HJB equation for small noise diffusions 
without multiscale features. In Section [31 we introduce assumptions, notation and review 
the large deviations results that we use for ([T]). Furthermore, we explain why the standard 
construction of importance sampling schemes based on the homogenized system fails in 
the multiscale setting. The main theorem and its proof are presented in Section [H where 
the correct change of measure is identified. In Section [5] we apply the general results to 
first order Langevin equations and derive some useful explicit formulas. Extensions to an 
equation with random environment are discussed in Section [6l We report simulation results 
in Section [7] for both the periodic and random cases in one dimension. The computational 
challenges that one faces when simulating trajectories of multiscale diffusions are discussed 
in Appendix A. 

2 Importance Sampling and Subsolutions 

In this section we review some known results on importance sampling for small noise diffu- 
sions without the multiscale feature. In particular, we discuss how subsolutions to a related 
HJB equation can be used to design and analyze importance sampling schemes for such 
systems. The purpose of these discussions is not only to introduce some basic concepts in 
importance sampling and subsolutions, but also to set the stage for discussions on why the 
standard procedure is not directly applicable to multiscale diffusion models. 

2.1 Preliminaries on Importance Sampling 

Let {X'^,e > 0} be a d-dimensional small noise diffusion for which a sample path large 
deviation principle holds, and denote the rate function over the interval [t,T] by StTi4')- 
Consider a bounded continuous function h : M i— )• M and suppose that one is interested in 
estimating 

by Monte Carlo. Define 

G{t,x)= inf [5tr(0) + /i(0(r))], (3) 

0GC([t,T];Md),0(t)=a; 

where C([i, T]; M'^) denotes the space of continuous functions from [t, T] to M'^. Then by the 
contraction principle 

lim-eloge{e) = G{t,x). (4) 

e->0 

Let T^{t, x) be any unbiased estimator of 9(e) that is defined on some probability space with 
probability measure P. In other words, r'^(t,x) is a random variable such that 

Er{t,x) = e{e), 



p. Dupuis, K. Spiliopoulos, H. Wang Importance Sampling for Multiscale Diffusions 



where E is the expectation operator associated with P. In this paper we will consider only 
unbiased estimators. 

In Monte Carlo simulation, one generates a number of independent copies of T^(t, x) and 
the estimate is the sample mean. The specific number of samples required depends on the 
desired accuracy, which is measured by the variance of the sample mean. However, since 
the samples are independent it suffices to consider the variance of a single sample. Because 
of unbiasedness, minimizing the variance is equivalent to minimizing the second moment. 
By Jensen's inequality 

E{T^{t,x)f>{Er{t,x)f = 9{e)\ 

It then follows from (JH) that 

limsup-elogE(r'(t,x))2 < 2G{t,x), 

e->0 

and thus 2G{t, x) is the best possible rate of decay of the second moment. If 

liminf-elogE(r'(t,x))2 > 2G(t,x), 

e-5-O 

then r'^(t,x) achieves this best decay rate, and is said to be asymptotically optimal. 

We note that even though much of this paper focuses on asymptotically optimal schemes, 
asymptotic optimality is not the only practical concern. If optimal or nearly optimal schemes 
are too complicated and difficult to implement then one may prefer to construct non-optimal 
but simpler schemes. This is also possible using the subsolution approach that is discussed 
later in the paper, and Theorem 14.11 identifies a lower bound on the improvement over 
ordinary Monte Carlo that will be obtained. In the end, it is an issue of balance between 
complexity and feasibility. 

2.2 Large Deviations for Small Noise Diffusions 

Consider a small noise d-dimensional diffusion process X'' = {X''{s),t < s <T} satisfying 

dX'{s)=r{X'{s))ds + ^/~e<^{X'{s))dW{s), X\t) = x. (5) 

Throughout this paper we work with the canonical filtered probability space (0,5, P) 
equipped with a filtration {Jt} that satisfies the usual conditions. Thus {5t} is right- 
continuous and 5^0 contains all P-negligible sets. Since the purpose of this section is ex- 
pository, we assume for simplicity that the coefficients r{x) and $(x) are smooth, that the 
diffusion matrix 

q[x) = $(x)<I>(x)^ 

is uniformly nondegenerate, and that all these functions are uniformly bounded. 

We next present a representation theorem proved in [6], which will be used here and also 

later on to analyze importance sampling schemes in the multiscale setting. Let A denote 

the set of all 5t-progressively measurable d-dimensional processes v = {f(s),0 <s<T} 

that satisfy 

rT 

E / \\v{t)fdt <oo. 



Jo 
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Theorem 2.1 Given e > 0, let X^ be the unique strong solution to (^. Then for any 
bounded Borel-measurable function g mapping C{\t,T];M. ) into M, 



-elogE 



exp 



9{X^ 



inf E 



T 



\v{s)fds + g{X^n 



where X''''" is the unique strong solution to the stochastic differential equation 

dX'^''{s) = r{X''\s))ds + ^{X''''{s))[/~edW{s) + v{s)ds\, t<s<T, (6) 

with initial condition X''''"{t) = x. 

It is well known that under these conditions the sample path large deviation principle holds 
for {X*", e > 0} with rate function 



1 



StT{4>) 



T 



<j){s) -r{<j){s)) 



g-l{0(s)) 



+00 



ds if(t>eAC{[t,T];R'^),(t>{t) = x 



otherwise. 



where AC{[t,T];M. ) denotes the collection of M -valued absolutely continuous functions on 
interval [t, T] and 

\\v\\b = Vv'^Bv 

for any f G M'^ and symmetric positive definite matrix B. When B is the identity matrix, 
\\v\\b is just the standard Euclidean norm \\v\\. 

2.3 Importance Sampling in the Absence of Multiscale Features 

We first recall the notion of a subsolution to an HJB equation of the type 

Ut{t,x)+H{x,V^U{t,x)) = 0, U{T,x) = h{x). (7) 

In this paper we consider mostly classical sense subsolutions. In some circumstances other 
types, such as weak sense subsolutions, may be useful [HI [15]. 



Definition 2.2 A function U{t,x) : [0,T] x 
equation ^ if 



H> 



is a classical subsolution to the HJB 



1. U is continuously differentiable, 

2. Ut{t, x) + H{x, VxU{t, x)) > for every (i, x) G (0, T) x R'^, 

3. U{T,x) < h{x) forxeM.'^. 

When using subsolutions for importance sampling it is often necessary to impose stronger 
regularity conditions somewhat beyond those of Definition 12.21 To ease exposition, we will 
assume the following condition throughout the paper. It is by no means most economical. 
In particular, the uniform bound on the first and second derivatives is not necessary, and 
can be replaced by milder conditions with further effort. However, it is convenient for the 
purpose of illustration since it guarantees the feedback control used in importance sampling 
is uniformly bounded and thus circumvents a number of technicalities. 
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Condition 2.1 U has continuous derivatives up to order lint and order 2 in x, and the 
first and second derivatives in x are uniformly hounded. 

Next we review the connection between subsolutions and the performance of related 
importance sampling schemes. Typically one designs a subsolution for a specific starting 
time and initial state {t,x). With an abuse of notation {t,x) will also be used at times to 
denote a generic point in [0, T] x W^ (the intended use will be clear from the context). The 
form of the Hamiltonian is naturally suggested by the calculus of variation problem ([3]) and 
the explicit formula of the rate function StT{4>) in Section [2. 2t 



H(x,p) = inf 



1 2 

{p, r{x) + ^{x)u) + - \\u\\ 



{r{x),p) - -{p,q{x)p) 



(8) 



In fact, under mild conditions G is the unique viscosity solution to ([7]). Let U{t,x) be a 
classical subsolution to (jT]) and u the feedback control defined by the minimizer in ([8|) with 
p replaced by VxU{t,x), i.e., 



u{t,x) = -$(x)^Va;C/(t,3;). 



(9) 



Note that under the given conditions u{t, x) is Lipschitz continuous in x, continuous in 
(i,x), and uniformly bounded. 

Consider the family of probability measures P^ defined by the change of measure 

^=exp|--y^ \\uis,X^{s))fds + —J^ {u{s,X%s)),dW{s)) 



By Girsanov's Theorem 

W{s) = W{s) 



u{p,X%p))dp, t<s<T 



is a Brownian motion on [t, T] under the probability measure P*^, and X^ satisfies X^{t) = x 
and 

dX'{s) = r {X'{s)) ds + <^ {X'{s)) [,fedW{s) + u{s, X'{s))ds] . 

Letting 

r(i,x)=exp|-i/.(X^(T))|^(X^), 

it follows easily that under P^, r^(i,x) is an unbiased estimator for 9{e). The performance 
of this estimator is characterized by the decay rate of its second moment 



Q'(t,x;n) = E" 



exp 



-^J<X-iT))} (f (X.) 



(10) 



Following |15] . a verification argument can be used to analyze Q''{t,x;u) as e — t- 0. 
To this end, we need an alternative expression of Q''{t,x;u) that allows us to invoke the 
representation in Theorem 12. 1[ More precisely, since u{s,x) is bounded and continuous, we 
can define X^'~'^ to be the unique strong solution to the equation 



dX^'-"(s) = r (X^'-"(s)) ds + $ (X^'-"(s)) [./^dW{s) - u(s, X^'-"(s))ds] 
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on [t,T] with initial condition X^' "(t) = x. Then by Lemma 14.31 (stated later on in 
generality sufficient for the multiscale case), 



Q%t,x;u) =Eexp<^ — hiX''-^{T)) + 



\u{s,X''-''{s))\\ ds}. 



Note that since u and h are bounded the exponent in the last display is uniformly bounded. 
Hence by Theorem 12.11 



-e\ogQ''{t,x;u) 



(11) 



inf E 

veA 



1 



T 



\vis)fds + 2h{X''-''^''{T)) 



|^(s,X^'-"'^(s))|rds 



where X^' '"''" is the unique strong solution to the equation 

dX^'-"'^(s) = r {X''-^'"{s)) ds + ^ {X''-'''"{s)) [V^dW{s) - [n(s,X^'-"'^'(s)) - v{s)]ds] 

on [t,T] with initial condition X'^'~'"''"(t) = x. 

Fix an arbitrary v £ A and let X = X'^'~^'^. Since U{t,x) is a classical subsolution and 
u{t,x) is the minimizer in ([8]), it follows that 

Ut{t,x) + {V^U{t,x),r(x) - ^{x)u{t,x)) > - ||'u(t,x)f 

for every (t, x) G (0, T) x M . Hence Ito's formula and ^ give 

dU{s,X{s))>^ u{s,X{s)) ^ ds + (v^U{s,X{s)),^{X{s))v{s)\ds 
+ yfe (V,.i7(s, X{s)), ^X{s))dW{s) 



^t' 



j{X{s))V^^U{s,X{s)) 



ds. 



Integrating the last two terms over [t, T] gives a random variable R{e, v) that converges in L^ 
to zero as e — 7- 0, uniformly inv G A. Observing that the second term on the right-hand-side 
is —{u{s,X{s)),v{s)) and using U(T,x) < h{x), one obtains 



h{X{T))-U{t,x) > 



u{s,Xis)) -{u{s,X{s)),v{s) 



ds + R{e,v). (12) 



Now we use the last display to bound one of the two h{X^' ""'^(T)) terms on the right-hand- 
side of (fTTI) . yielding the lower bound 



v{s) - u{s, X{s)) ds + h{X{T)) + C/(t, x) + R{e, v). 



'^Jt 



Setting v{s) = v{s) - u{s,X{s)), it follows that X = X^'^ with X^^" defined as in ([6]). Since 
V &A,hy Theorem [2T] 



E 



\v{s)rds + h{X{T)) 



>-elogEexp<!--/i(X^(r))|>, 
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and therefore 



Mminf —e log Q''{t,x;u) > liminf inf E 

e— >-0 €— i-O vGA 



1 '■^ 



2./i 



\v{s)f ds + h{X{T)) + R{€,v) 



U{t,x) 



>liminf-elogEexp|--/i(X'(T))l +U{t,x) 

= G{t,x) + tJ{t,x). 

Given that U isa subsolution, it is automatic that U{t, x) < G{t, x). Thus for the scheme 
to be asymptotically optimal we need U{t,x) = G{t,x) at the starting point (i,x). The 
subsolution U{t,x) = corresponds to standard Monte Carlo (i.e., no change of measure), 
and we recover the expected decay rate for that case, which is G{t, x). Note that if one can 
obtain a bound on E [2i?(e, v)] that is uniform in v £ A, then non- asymptotic bounds on 
the variance can also be obtained. 

3 Large Deviation Properties of Multiscale Diffusions 

In this section we introduce assumptions and notation, and briefly review the large devi- 
ations results for multiscale diffusions [l3]. We also revisit the subsolution approach to 
importance sampling as discussed in the last section, and identify where the standard con- 
struction breaks down if the multiscale feature of the problem is not incorporated. Through- 
out this section we assume a periodic environment, that is, the functions b{x,y), c{x,y), 
and cr{x, y) are periodic with period A in every direction with respect to the second variable 
y. The extension to general random environments but with specialized dynamics, namely 
first order Langevin equations, is discussed in Section [H 

3.1 The Large Deviation Principle 

We recall that the SDE of interest is 



dXHs 



l6fx^(.),.^^(^)^.^^— ^^(^)^ 



) + c ^X^s), ^) j dt + V~ea ^X^is), ^) dWis) 



6 \ ' " 6 
X'{t) = X. (13) 

The following condition on (|13p will be used whenever the periodic case is discussed. 

Condition 3.1 1. The functions b{x,y), a {x,y) are continuous and globally bounded, as 
are their partial derivatives up to order 2 in x and order 1 in y. The function c{x,y) 
is bounded and Lipschitz continuous. 

2. The diffusion matrix 0"(x, y)a{x, y) is uniformly nondegenerate. 

The following condition will also be assumed. In the condition, T"^ = [0, A]'^ denotes the 
d— dimensional torus. 
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Condition 3.2 Consider the operator defined for smooth / : T — )■ M by 

C^f{y) = {b{x,y),Vf{y)) + ^tr [a{x,y)a{x,yfVVf{y)] , 

together with periodic boundary conditions in y. For any fixed x, let fi{dy\x) be the unique 
invariant probability measure corresponding to Cx- Then the drift b satisfies the centering 
condition (cf. f^) 

b{x,y)fi{dy\x) = 0. 



Under Condition l3.2l for each £ £ {1, . . . ,d} and x there exists a unique function xti^, y) 
that is twice differentiable and A— periodic in every direction in y, and which solves 

^xXi{x,y) = be{x,y), / xdx,y)lJ-idy\x) = 0. (14) 

For a proof see [5l Theorem 3.3.4]. The equation p4p is known as a cell problem. Let 

X = (Xi,--- ,Xd)- 

As we shall see below, x plays a crucial role in the design of asymptotically efficient impor- 
tance sampling schemes for multiscale diffusions. 

We state here the sample path large deviations principle for the solution of (I13p derived 
in [13]. Large deviations principles for special cases of (|13p can also be found in |2H [3]. 

Theorem 3.1 Assume Conditions \3.1\ and \3.S\ and let {X'',e > 0} be the unique strong 
solution to hlS\). Let 



r{x) = (l + — ]{x,y)c{x,y)n{dy\x), 



'^^"^ = /tX^ + S)^"'^^"^"' 



y)(y{x,yY' [l +-Q-] {x,y)'^fi{dy\x), 



where I denotes the identity matrix. Then {X^,€ > 0} satisfies a large deviations principle 
with rate function 



\f ^{s)-r{ct>{s))\ ds ^/<AG^C([t,^];M'^),</.(t)=x 

+00 otherwise. 

Comparing with the rate function for small noise diffusions in Section 12.21 it is obvious 
why r[x) and q{x) are referred to as the "effective drift" and "effective diffusivity" in the 
literature. 
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3.2 A Naive Use of Subsolutions for Multiscale Diffusions 

In this section we illustrate the failure of the standard construction of importance sampling 
schemes by subsolutions as was outlined in Section I2.3[ Even if one uses a subsolution with 
the maximum possible value at the starting point, the scheme can be far from optimal if 
the multiscale feature is not incorporated. 

The large deviation rate function in Theorem 13.11 is identical to that of a small noise 
diffusion ([5]) with dispersion matrix <I>(x) as long as 



<I>(x)<I>(x) = q{x). 



(15) 



Note that for a given q{x), the choice of $(x) is not unique. However, the distribution 
of the solution to ([5]) remains the same no matter which <I>(x) is used, and so we fix a 
Lipschitz continuous diffusion matrix $(x) for which (jlSp holds. Due to the form of the 
calculus of variation problem in the rate function, the H JB equation related to the multiscale 
diffusion model and the Hamiltonian H are exactly the same as in ([7|) and ([8]) , respectively. 
Therefore, given a subsolution U{t,x), ([9]) suggests the control 

u{t,x) = -^{x)'^V^U{t,x) 

which we now blindly apply to the multiscale diffusion process model. 

Suppose that one mimics the steps used in Section 12.31 for the new process model. To 
simplify notation, as before we temporarily denote X'^'""'" by X. Then in place of ()12p one 
obtains 



h{X{T))-U{t,x) 




u{s,Xis)) \(v,Uis,X{s)),a[X{s),^]v{s) 



ds 



V,Uis,X{s)), 



-b + c 



Xis), 



X{s) 



r{X{s)) ) ds 



+ j^ (V,U{s,X{s)), 
+ R{e,v) 



HXis))-aiXis), 



X{s) 



u{s,X{s)) ) ds 



(16) 



where R{e,v) — )• in L^, uniformly in v £ A. The second integral term in the right hand 
side of ()16|) involves ^b{X (s) , X {s) / 5) . To deal with the fact that e/6 t oo as e J, 0, we recall 
the cell problem ()14p and define the function ^ = (-01, • • • , V'd); where for each i £ {1, . . . ,d} 



ipi{t,x,y) = xe{x,y) 



dU{t,x) 
dxi 



10 
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Applying Ito's formula to -0 and substituting into (J16p . we obtain 
h{X{T))-U{t,x) 



> 



T 



u{s,X{s))\f + (v,U{s,X{s)), (^+ 1^) ^ f^^")' ^) ^(") 



ds 



+ r(v^U{s,X{s)),(l + ^)c(x{s),^ 



dy 



+ 



V,U{s,X{s)), 



dx 



'^W«))-(^ + ^)^(^(« 



r{X{s)) ) ds 
X{s) 



6 



u{s,X{s)) ) ds 



+ Rie,v), 



where again R{e,v) — )■ in L^, uniformly in v £ A. If this inequality is inserted into the 
representation (jlip . then it becomes clear that the desired lower bound will not follow. 
Under mild conditions, homogenization can be applied as in |13^ Theorem 2.7] implying 
that the second integral term vanishes in the limit. However, regarding the third term, one 
will need the inequality 



V,U{s,Xis)), 



'-|)^h''^ -*<*<•'» 



v{s) )ds>0 



to hold at least approximately for e small. While the corresponding bound held trivially 
in the case without multiscale if $ is chosen as a, it will not hold even approximately here 
regardless of the choice of $. Indeed, homogenization theory [131 Theorem 2.7] implies that 



E^ /v.,U{s,X{s)),Ulx{s), 



X{s) 



jd 



<j(X{s),y]^i{dy\X{s))\v{s)) ds ^ 0. 



where a = {I + dx/dy)a. Therefore in order for the desired inequality to hold one should 
choose 



^{x) = j^^l^I+^a{x,y)^i[dy\x). 



This is not possible since it violates (jlSp in general. 

From the preceding discussion it is not difficult to see the fundamental difficulty that the 
averaged or effective diffusivity is too crude an approximation for the corresponding control 
in importance sampling to be efficient. The form of q in Theorem 13.11 in fact suggests the 
correct control, which is 



dx, 



u{s,x,y) = -a {x,y){l+—{x,y)] V^U{s,x), 



T 



(17) 



where y will be replaced by the fast motion X''{s)/6 in implementation. A proof of this 
assertion will be given in the next section. Not surprisingly, this form of control is consistent 
with the control used in the proof of the large deviation lower bound in [13| . 
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4 Statement and Proof of the Main Result 

Before stating and proving the main result, we recapitulate the framework and notation. 
We are interested in importance sampling estimator for a functional of the form 

where X^ satisfies the SDE (J13p . According to Theorem 13.11 the relevant H JB is ^ with 
the Hamiltonian H of the form ([8|). Furthermore, if h is bounded and continuous then 

G{t, x) = inf [Stricp) + K<I){T))] = lim -e log 0(e), 

where the infimum is taken over all (p £ C{[t, T];]R. ) such that cl){t) = x. 

Let U he a subsolution to the HJB equation with the terminal condition h, and define 
the control u by (117p . Letting 



u{s) = u{s,X%s),X%s)/d), 
it follows from Girsanov's Theorem that 

dX'is) = [^b + c] (x'{s), ^^^] ds + a (x'{s), ^^^] [VedWis) + n(s)ds] , 
where W{s) is a standard Brownian motion under the probability measure P*^ defined by 

dP' r 1 r^ n , 1 '■^ 



dP 



expj-^^ \\uis)fds + ^J^ {uis),dWis))Y (18) 



The performance measure is then given by the decay rate of the second moment Q'^{t,x;u) 
as defined in (fTOl). 



Theorem 4.1 Let {X'',e > 0} be the solution to il3\) . Consider a bounded and continuous 
function /i : M*^ i->- R and assume Conditions \2Al \3.1\ and \3.^ Let U{s,x) be a subsolution 
of ^ and define the control n(s, x, y) by |J7| j. Then 

liminf-elnQ'(i,a;;n) > G(t,x) + U(t,x). (19) 

Theorem 14.11 does not cover the important case of estimating probabilities such as 
P[X''(T) G A\X''{t) = x], since in this case the corresponding function h is neither bounded 
nor continuous. Recall that a set A C M is called regular [with respect to Stx and the 
initial condition (t, x)] if the infimum of StT over the closure A is the same as the infimum 
over the interior A°. The following result analogous to Theorem 14.11 holds. Its proof uses 
an argument very similar to [15] and is thus omitted. 



Proposition 4.2 Let {X'^,€ > 0} be the solution to iLS\) . Assume Conditions \2.1{ \3.1\ and 
\3.3[ Consider a regular set A CW^ and let 

hix) = i' ^^^^^ 

I +00 if X ^ A. 

Let U{s,x) be a subsolution of ^ with the terminal condition h and define the control 
u{s,x,y) by p7\ ). Then [W\) holds. 
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The rest of this section is devoted to the proof of Theorem 14.11 We first estabhsh an 
alternative representation for the performance measure Q*^ in terms of bounded functions, 
which will allow us to invoke Theorem 12.11 

Lemma 4.3 Let X^ = {X^-^,t <s<T} solve X^{t) = x and 



dX^s) 



15 



b + c 



X-(s),^^\ds 



+ a[X'is). 



X^{s) 



V^dW{s) - u ( s,X'{s), ^^ 1 ds 



Then 



Q''{t,x;u) = E' 



e.p{-h{X^{T))\(§,{X^ 



Eexpl--h{X'{T)) + - I \\u{s,X'{s),X'{s)/5)fds\ 



Proof. Since u(s, 2, z/6) is uniformly bounded, it follows from Girsanov's theorem that 



dP 



1 



V^ 



T 



1 



exp<; -I {u{s),dW{s)) / \\u{s)\\' ds 



2eJt 



T 



defines a new probability measure Q under which 



1 



W{s) = W{s) + -^ / u{p)dp 

is a Brownian motion. Therefore X'' under Q has the same distribution as X^ under P. 
This implies 

r 9 1 f'^ 

Eexpl — h{X'{T)) + - / \\u{s,X'{s),X'{s)/6)fds 
= E^exp !.--h{X'{T)) + - I \\u{s)fds\ 

= Eexp |-\(X^(r)) + ^ ^^ \Hs)fds -^J'^ {u{s),dW{s))^ 
Using (fT5|) . we can continue the last display as 



T 



E'exp<^ — h{X'{T)) + - \\u{s)fds-^ {u{s),dW{s)) 



V^ 



E^ 



exp^-lMX.(T)) (f(X. 



This completes the proof. 
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Proof of Theorem 14.11 Since h and u are bounded, it follows from Lemma 14.31 and 
Theorem [24] that 



-elogQ^(t, x;n) 



(20) 



inf E 



1 



|ti(s)|| ds 



T 



\u{s,X{s),X{s)/5)fds + 2h{X{T)) 



where X = X'^''" solves X{t) = x and 
dX{s) 



|4x(,,^ 



+ c s,X(s) 



X(s) 



+ a{X(s) 



X{s) 



[^/edW{s) + v{s)ds] 



ds 



with 



c {s, X, y) = c(x, y) - a{x, y)u{s, x, y). 



The asymptotic analysis of variational problems analogous to ()20p has already appeared 
in [13] , where large deviation properties of multiscale diffusions such as Theorem 1341 have 
been established through a weak convergence approach. To be more precise, the condition 
e/5 —7- oo corresponds to what is called Regime 1 in |13j . and occupation measure techniques 
are used to characterize the limit of variational problems. Tightness and characterization in 
terms of "relaxed controls" are proved in Proposition 3.1 and Theorem 2.8 of [13], and then 
the relaxed control formulation is rewritten in terms of an ordinary control in Theorem 5.2. 

The difference between the current variational problem and those considered in [13] is 
that in [13] c was independent of time and the middle term in the right-hand-side of (j20p 
was absent. Nonetheless, these differences are only superficial and the analysis of quantities 
similar to the middle term of (|20p . e.g., 

%{s,X{s),^\ds, 



can be carried out using the same arguments as in [131 Proposition 3.1 and Theorem 2.8]. 
For this reason, we will directly state a bound for (|20|) without giving the details of the 
analysis: 



liminf — elogQ'^ft, x; u) 



> inf 

4>&AC{,[t,T\;m.'i),<j>{t)=x 



T 



<p{s) -r{s,4>{s)) 



q~H^{s)) 



ds 



(21) 



T 



t Jfi 



\u{s,(t>{s),y)f iJi{dy\(t>{s))ds + 2h{(t>{T)) 



where 



r{s,x) = r{x) 



/ M + -^(a;, y) j <^{x, y)u{s, x, y)fi{dy\x) 
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Recalling the definition of n, 

f{s, x) = r{x) + q{x)V xU{s, x) 
and 

/ \\u{s,(l){s),y)f ii{dy\(l){s))ds = / (V^^(s, (^(s)), (7(x)V^^(s, 0(s)))(is. 
t Jjd Jt 



Thus the quantity to be minimized in (I2ip can be rewritten as 
I rT 9 rT 



2./i 



s)-r{Hs)) ' ^^^^^ds- I {cl>{s)-r{cl>{s)),V,U{s,cP{s)))ds (22) 



1 '■^ 



{V^U{s, <P{s)),q{x)V,U{s, <P{s)))ds + 2/i(0(T)). 



2jt 
Given an arbitrary </) G ^C([t, r];M'^) with (/)(t) = x, the subsolution property implies that 

^^U{s,^{s)) = Ut{s,ct){s)) + {V,U{s,ct){s)),^{s)) 

> {V,U{s, 4>{s)), 4>{s) - r(0(s))) + ^{V,U{s, 4>{s)) , qi^{s))V ,U {s , 4>{s))). 

Integrating both sides on [t,T] and using the terminal condition U(T,x) < h{x), it follows 
that (j22p is bounded from below by 

I [ <Pis) - r{cPis)) ' ds + h{<P{T)) + U{t, x). 

Note that the first summand is Stxi'P) ^'^'^ by definition G{t,x) is the infimum of the sum 
of the first two terms over (/> G AC{[t, T];R.'^) with (/>(t) = x, and thus 

liminf-elogQ'(t,x;u) > G{t,x) + U{t,x). 

e— ^-O 

This concludes the proof. ■ 

5 First Order Langevin Equation with Periodic Environment 

In this section, we apply the general results to a special but important class of diffusion 
models, namely, the first order Langevin equation 



dX'{t) = -VV (x'{t), ^j^) dt + yfeV2i5dW{t), X^(0) 



X, 



where V" is some potential function and 2D the diffusion constant. We are particularly 
interested in the case where the potential function V^ is composed of a large-scale smooth 
part and a fast oscillating part of smaller magnitude: 

V'{x,x/5) =eQ{x/5) + V{x). 
15 
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Figure 1: V^{x, x/5) = e(cos(x/(5) + sin(x/(5)) + x^ /2 with e = 0.1, 5 = 0.01. 



Thus the equation of interest is 



dX'{t) 



:VQ 



6 



VV{X%t)) 



dt + ^/eV2DdW{t), X'{0) 



X. 



(23) 



In the notation of previous sections, this corresponds to 

b{^,y) = -^^Qiy), c{x,y) = -VV{x), a{x,y) 



'2D. 



An example of such a potential is given in Figure [TJ As before, we examine in some detail 
the model ([23j) with periodic environment, and it is assumed in this section that Qiy) 
is periodic with period A. This periodicity assumption may seem too artificial in many 
practical applications. However, it motivates by analogy the design of importance sampling 
schemes for first order Langevin equations with general random environment. See Section 
[6] for a discussion on these extensions. 

An important observation for an equation of the form (1230 is that the cell problem ()14p 
depends only on y and not on x. Hence, in order to compute u from (|17p we need only 
solve the cell problem (I14p once. To be more specific, the invariant distribution ^{dy\x) to 
the cell problem is independent of x, is of Gibbs type 



li{dy\x) = fi{dy) 



1 



Q(y) , 
D dy, 



L 



jd 



Q{y) , 
D dy, 



and satisfies Condition 13. 2[ 

Explicit formulas for the large deviation rate functions and related quantities are readily 
available for the one dimensional case d = 1. For multi-dimensional cases, they are also 
available under extra assumptions on the potential function V [13]. Since our numerical 
simulation will be performed on one-dimensional models, we only state the relevant results 
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in Corollary 15. II and refer the readers to [13] for more general formulas. The proof is omitted 
as it is a straightforward calculation from Theorems 13.11 and 14.11 



Corollary 5.1 Consider the one dimensional case d = 1 and let {X'^} be the unique strong 
solution to l^23\) . Under Condition ] 3. 1\ {X""} satisfies a large deviations principle with rate 
function 

5oy(</,) = <'^/ J['^(*)-''('^(^))]''^^ iS4>^AC{[Q,T];^) and<l){Q) = x ^^4) 



where 



and 



+00 otherwise, 



AV(x) 2DA2 

r{x) = — , q= — 

LL LL 



L = e o dy, L = e d dy. 

h Jt 



Given a classical subsolution U , the importance sampling control that appears in Theorem 
\4-l\ takes the form 

u[t,x,y) = — e D Ux{t,x). (25) 

An interesting observation from Corollary 15. II is that the effective diffusivity eq is always 
smaller than the diffusivity €2D of the unhomogenized equation, since by Holder's inequality 

LL> ( [ dy] = \\ 



as long as Q is not a constant. The intuition is that the potential surface has many small 
local minima, which manifest themselves in the homogenized dynamics by a reduction in 
the diffusion coefficient since a particle traveling on a rough potential surface may suffer 
from the "trapping" effect of these local minima. 

6 Extension to Random Environment 

Up until now all the multiscale diffusion models we have considered are of periodic envi- 
ronment, i.e., the drift vector and the dispersion matrix are both periodic with respect to 
the fast variable. This section discusses an extension to a random environment. To illus- 
trate the main idea, we specialize again to diffusions governed by the first order Langevin 
equation of type 



dX''\t) 



■^vQ(^)-vy(x^(t)) 



dt + ^ey/2DdW{t), X'^\{))=x. (26) 



In this section, we find it convenient to keep track of both e and 5 and write X^' (t) for 
the solution to (1260 as opposed to X"" as in Section [5l The following condition is assumed 
throughout this section as a substitute for Condition 13.11 in the periodic case. 
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Figure 2: V\x,x/5) = eQ{x/6) + x^ 12 with e = 0.1 and b = 0.01. 



Condition 6.1 1. The coefficient {Q{y),y € M } is a stationary, ergodic random field 
defined on some probability space {"^,Q,i'). For every a; G ^, Q{y,uj) is C^(M'^) in y 
with bounded and Lipschitz continuous derivatives up to order 2. 

2. The coefficient V{x) is deterministic and V{x) E C'^{W^) with bounded and Lipschitz 
continuous derivatives up to order 2. 



The Wiener process in (j26p is defined on another probabihty space, and we work with the 
product space and product measure and so W is independent of Q. Note that for the sake 
of notational simplicity, we have suppressed the dependence of X""' and Q on a; for w G ^. 
Under Condition 16. 11 for every a; G ^, there exists a unique strong solution to the SDE (p6|) . 
In contrast to the periodic case, when equation (I26p is used to model the dynamics of a 
particle in a rough potential, the roughness is due to the "small" randomness generated by 
the random field eQ{x/5). Figure [2] depicts a realization of such a random field superimposed 
on the smooth potential function V{x) = x^/2. In the figure, Q{x) is a zero mean Gaussian 
random field with Gaussian type correlation, i.e., E'^[Q{x)Q{y)\ = exp[— |x — y\^]. 

Compared with multiscale diffusions with periodic environment, the analysis for general 
random media is relatively new. To the best of our knowledge, the first attempt to generalize 
the results obtained for the locally periodic setting to the locally stationary setting probably 
appeared in the work of [37], which studied random walks on Z with a locally stationary 
environment. Extensions have been considered in [42 ^ |43] to diffusions whose generators are 
self-adjoint and take a certain form. 



6.1 Homogenization in One Dimension 

As in Section [5l we state the homogenization theorem for the one-dimensional case where 
everything can be explicitly quantified. For higher dimensions, analogous results exist but 
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explicit calculation is much more difficult. In the following result we assume that e is fixed 
and let 5 tend to zero in order to clearly identify the effect of homogenization. To ease 
exposition, we temporarily denote X^' by X . 



Theorem 6.1 Consider the one dimension case and assume Condition \6.1[ Then the law 
of X on C([0,T];R) converges weakly to the law of X, in probability with respect to v, 
where X is the solution to the SDE 



dX{t) 



KK 



^ V'{X{t))dt + J^dW{t), X{0)=x, 



and with 



K = E'' 



-Qiy)/D 



KK 



K = W 



oQ(y)/D 



(27) 



The rest of this subsection is devoted to the proof of this theorem. Since it is very 
similar to that of [421 Theorem 3.1], we will only give an outline. Without loss of generality, 
we assume e = 1 in the proof. We start with the following lemma. 

Lemma 6.2 Let /i : M — t- M 6e a bounded and measurable function and let {ip{y),y G M} be 
a stationary random field such that 



E'' 



lV'(?/)|e 



-Qiy)/D 



< oo. 



Then as 5 ^ 



Euj sup 

0<i<T 



^ 



^Y^^ /i(^'^(s))ds -B I h{X\s))ds 







in L^{y), where E^ denotes the expected value with respect to the independent Wiener process 
W but with a; G ^ given, and 



^{y)e 



'Qiy)/D 



Proof. The proof follows from Theorem 6.1 in [l2]. The only observation we need to make 
is that in our case the fast motion is governed by the generator 



d 



d^ 



-Q'{y) +D—^. 
ay dy^ 

Then by the discussion in Section 2.2, in particular Theorem 2.1, of [36], the averaging 
should be taken under the corresponding ergodic stationary probability measure on (^,^) 
say vr which satisfies 

for any stationary random field ip. ■ 



Proof of Theorem 16.11 Let {x(y)i V G M} be the random field defined by 

1 rv 



x{y) 



K Jo 



M^VD^, 



y 



19 



p. Dupuis, K. Spiliopoulos, H. Wang Importance Sampling for Multiscale Diffusions 



It is straightforward to verify that E'' [x(y)] = E'' [x'(y)] = and 

K 

-Q'{y)x'{y) + Dx"{y) = Q'{y). 



(28) 



Define the process 

Y\t) = X\t) + / 
Jo 

Then Ito's formula implies that 



1 + x' 



> fX'is) 



V'iX%s))ds + 6-x 



X\t) 



Y\t) = x + V2D [ 
Jo 



1 + X 



dW{s). 



The latter and (I28p imply that {1^ } is v-a,.s. a martingale with quadratic variation 



I, , , 2D /■* 2Q(X^(s)/S) 

^o\(+\ _ ' e D ds. 



K^Jo 
It follows now from Lemma 16.21 [taking h{x) = 1] that as (5 — )• 



E^ sup 

0<t<T 



{Y^){t)-Bt ^0 inLi(z^), 



where 



2D 1 , 



My)/D 



2D K 2D 



K'^ K KK 

Similarly, by Lemma 16.21 [taking h{x) = V'{x)] again we have that 



E^ sup 

0<t<T 



1 + X 



,(X'{s) 



r'tvi 



V'{X'{s))ds 



1 



KK Jo 



rl/v5 



V'{X\s))ds 



in L^{v) as (5 — )• 0. Finally, note that by Birkhoff's ergodic theorem ^^ — )• z/-almost 
surely, as |y| — )■ oo. Thus as 5 — )■ 



sup 

0<t<T 



s-x 



X'{t) 



with probability one 



z^-almost surely. Then the desired convergence follows immediately since 



Y^(-) ^x- 



2D 
KK 



Wi- 



lli probability with respect to i'. This completes the proof. 
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6.2 Importance Sampling Schemes 

It is easy to see that the homogenization result in the random case is analogous to that 
in the periodic case Corollary 15.11 with A = 1 and L, L replaced by K,K, respectively. By 
analogy, it suggests that in the one dimensional case {X""' ,e,6 > 0} satisfies the large 
deviations principle with rate function given by (|24p . where 

V'ix) 2D 

r{x) — 



KK KK 

and K and K are defined by (|27p . Therefore it is natural to conjecture that given a classical 
subsolution [/, the corresponding control takes a similar form 

/ 1 Q(y) - 

u{t,x,y) = -V2D—e~D-U^{t,x). 

Note that in contrast to the periodic case, the control u is random in that it implicitly 
depends on a; G ^ since the random environment Q{y) depends on it. Even though the 
associated importance sampling schemes are shown to be efficient in our empirical study, 
development of the underlying large deviation theory and a rigorous performance analysis 
remains to be done. 

7 Simulation Results 

In this section we test the performance of various Monte Carlo estimators for multiscale 
diffusions with periodic or random environment. Throughout this section, we assume that 
the diffusion process X^ is one-dimensional and satisfies the first order Langevin equation 



7.1 Simulation Results for Periodic Case 

Suppose that we are interested in the Monte Carlo estimation of 

for a continuous function h and in a periodic environment. We compare three unbiased 
estimators. 

1. The standard Monte Carlo estimator 9o{e), 

2. The importance sampling estimator 6i{e) based on the change of measure suggested 
by dig), i.e., 

ui{t,x,y) = -V2Die^^yy''U^{t,x), 
Ij 

where C/ is a subsolution to the homogenized HJB equation ([7|). Note that in imple- 
mentation, X will be replaced by the current state of X'' and y by X^ jb. 



21 



p. Dupuis, K. Spiliopoulos, H. Wang Importance Sampling for Multiscale Diffusions 



3. The importance sampling estimator ^2(e) based on the change of measure correspond- 
ing to the control 

U2{t,x) = -y/qtJ^{t,x) = -V2D—==Uxit,x). 

VLL 

This is the change of measure based on the control suggested by the homogenized HJB 
equation without taking into consideration of the multiscale nature of the dynamics. 
It is independent of the fast variable and differs from the control ui by a factor 

^/2D/Q[^ + X'{y)]- 

Based on the theory developed previously, the estimator ^i(e) should outperform the 
other two estimators as e — t- 0. 

For numerical experimentation, we consider the potential function that is drawn in 
Figure [H that is, V{x) = x^/2 and Q{y) = cosy + siny. Thus the period is A = 27r. Let 

Hx)=i^'-'^: ^^' 

^ ' \{x + lf x<0. 
It follows from CoroUarv 15.11 that the effective drift r{x) and diffusivity q are 

rix) = —Kx, q = 2Dk, where k = — - = — —. 

LL LL 

The limiting HJB equation ([7]) becomes 

Ut{t,x)-KxU^{t,x)-KD\U:r{t,x)f = 0, U{T,x) = h{x). (29) 

Under mild conditions that are satisfied here, the unique viscosity solution to this HJB 
equation equals 

G{t,x) = mi[StT{(t)) + h{ct>{T))] = lim-elog9{e), 

where StTi') is given by Corollarv 15.11 and the infimum is taken over all (f) € ^C([t,T]; 
such that (f){t) = x. One can solve this variational problem explicitly and obtain 

G[t,x) — 



(l + 2L»)e2«r-2De2'^*' 

Since G is not smooth at x = 0, it is not a classical sense solution. In general one should 
mollify it in order to produce a smooth subsolution, but it is known (see [46] for an analogous 
situation) that the bound on performance is still valid if the subsolution is the minimum 
of two classical sense solutions with a single discontinuous interface. Therefore we can just 
define the subsolution U as the solution G and the corresponding controls are 

ui[t,x,y)-V2V ^e {1 + 2D)e^-T - 2De^-^ ^^'' 

^2(*,-) = ^'^ y^(l + 2ZJ)e2^^-2IJe2^* -^^g"^^^' 
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where sign(x) = 1 if x > and sign(x) = — 1 if x < 0. 

In the numerical simulation, we set D = 1, the initial condition (t, x) = (0, 0.05), and the 
terminal time T = 1. One can calculate L = 9.83999 and k = 0.407728. We used a predictor- 
corrector Euler scheme to simulate the trajectories of (j23p and the associated controlled 
SDE. For reasons that will be discussed in Appendix A, a direct numerical approximation 
scheme was adopted instead of other techniques such as multiscale integrator or projective 
integrator methods (see [45 1 117 ^ 123]). By Theorem 17. II we know that the error in the Euler 
approximation is bounded by a term of order Ae/5^, where A is the time discretization 
step. For each choice of e and 6 we chose A so that the aforementioned error bound is 
of the order 0.001. In other words, we set A = 0.001 • (5^/e. Hence, as e gets smaller the 
discretization step A becomes smaller as well. Even though by extensive experimentation 
we found that in general the choice of A was crucial for obtaining accurate results, for the 
periodic example studied here the requirement can be relaxed and a coarser discretization 
can still lead to accurate and stable results. 

Simulations were done using parallel computing in the C programming language. We 

used Mersenne Twister [35] for the random number generator, with a sample size of A^ = 10 . 

The measure for comparing different schemes is the relative error of the estimators, which 

is defined as 

, . . standard deviation of the estimator 

relative error = — 

expected value of the estimator 

The smaller the relative error the more efficient the estimator. Since in practice both the 

standard deviation and the expected value of an estimator are typically unknown, empirical 

relative error is often used for measurement. In other words, the expected value of the 

estimator will be replaced by the empirical sample mean, and the standard deviation of the 

estimator will be replaced by the empirical sample standard error. In order to distinguish 

among the different Monte Carlo procedures, we denote by Pi{e) the empirical relative error 

of ^i(e) for i = 0,1, 2. We would like to point out that the expected value in the denominator 

[which is always 9{e) due to unbiasedness] is replaced by Oi{e), regardless of i. The reason 

is that 9i{e) is the most accurate estimate of ^(e). 

The numerical results are summarized in Table [TJ As suspected, the estimator 6i{e) 

significantly outperforms both the standard Monte Carlo estimator 9o{€) and the estimator 

02(e) which corresponds to the change of measure purely based on the homogenized HJB 

equation. In particular, the estimator 9i{e) seems to be of bounded relative error, which is 

a stronger notion of efficiency than asymptotic optimality. 



No. 


e 


6 


e/S 


Oo{e) 


Oi{e) 


He) 


Po(e) 


/5i(e) 


hie) 


1 


0.25 


0.1 


2.5 


2.26e-l 


2.25e- 1 


2.26e~l 


3.36e-4 


1.76e-3 


6.34e - 4 


2 


0.125 


0.04 


3.125 


3.66e - 2 


3.65e - 2 


3.66e-2 


8.40e - 4 


1.80e-3 


1.43e-3 


3 


0.063 


0.016 


3.94 


9.34e - 4 


9.33e-4 


9.36e-4 


3.29e-3 


1.85e-3 


4.71e-3 


4 


0.03125 


0.007 


4.46 


6.93e - 7 


6.87e-7 


6.99e-7 


4.47e - 2 


8.00e-4 


3.32e-2 


5 


0.025 


0.004 


6.25 


1.48e-8 


1.61e-8 


1.51e-8 


6.85e-2 


7.55e-4 


3.06e - 2 


6 


0.02 


0.002 


10 


3.08e- 10 


1.99e- 10 


1.51e-10 


4.09e - 1 


3.82e-4 


4.97e - 2 


7 


0.015 


0.0013 


11.54 


7.60e-14 


1.37e-13 


1.07e-13 


2.53e-l 


3.01e-4 


1.86e-l 



Table 1: Comparison table for periodic case 
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7.2 Simulation Results for Random Case 

In this section we test the performance of the proposed estimator in the case of a random 
environment by estimating an exit probabiUty. In particular, we again consider the first 
order Langevin equation (j23p in one dimension and wish to estimate the exit probability 



e{€) = P [X'{t') = x+\X'{0) = x] , 

where the exit time r^ is defined by 

r' =inf {t > : X%t) ^ (a;",x+)} with x G {x~,x+). 

As in the periodic case, we compare the estimator proposed in Section 16.21 [again denoted 
by ^i(e)] with the standard Monte-Carlo estimator Oq{€) and with the estimator ^2(£) that 
corresponds to the change of measure based just on the homogenized HJB equation. 

We consider V{x) = x and Q{y) to be a zero mean Gaussian random field with covariance 
function 

E''[Q{x)Q{y)]=exp[-\x-y\% 

It follows from Theorem 16. II that the effective drift r{x) and diffusivity q are respectively 

1 



-K, q = 2Dk, where k 



KK 



In this case the limiting HJB equation is the time independent version of ()29p defined on 
the interval (a;~,x"'") 

kU'{x) + kD\U\x)\^ = 0, 

with the boundary condition U{x) = if a; = x"*" and U{x) = oo \i x = x~ . We consider 
the case D = 1, initial point {t,x) = (0,0) and x^ = ±0.5. The maximal viscosity solution 
(which is also the maximal classical sense subsolution) to this HJB equation is 

U{x) = yA^~^ ~ 2;], X G (x",x"'~). 

While this example is not over a finite time interval, the proof of Theorem 14.11 can be 
adapted as in [15] to yield the analogous results. The importance sampling estimator Oi[e) 
corresponds to the control 

^ ' KD 
while the estimator ^2(e) corresponds to the constant control 



2D 1 

Following the notation of the periodic case, we summarize the simulation results in Table 
[2j We used the randomization method to simulate the Gaussian random environment. See 
|28j for an exposition on the simulation of Gaussian random fields. 

As in the periodic case, the estimator 6i{e) outperforms both 9Q{e) and 02(e)- Comparing 
Tables [1] and m the reader may wonder why we did not try combinations of (e, 6) with larger 
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No. 


e 


S 


e/S 


Oo{e) 


^i(e) 


02{e) 


Po(e) 


PiW 


P2(e) 


1 


0.25 


0.1 


2.5 


1.38e-l 


1.38e-l 


1.38e-l 


7.88e - 4 


1.59e-4 


8.09e-4 


2 


0.125 


0.04 


3.125 


1.28e-2 


1.31e-2 


1.28e-2 


2.27e-3 


4.91e-3 


2.61e-3 


3 


0.0625 


0.018 


3.472 


6.02e - 4 


6.13e-4 


5.89e-4 


1.12e-2 


5.93e-3 


1.32e-2 


4 


0.05 


0.01 


5 


2.38e-5 


2.30e - 5 


2.22e-5 


6.70e-2 


8.89e-3 


9.97e-2 


5 


0.04 


0.007 


5.72 


5.5e-6 


5.93e - 6 


4.86e - 6 


1.25e-l 


5.53e-2 


1.05e-l 


6 


0.025 


0.004 


6.25 


- 


7.82e-10 


1.26e-09 


- 


3.86e - 2 


5.87e-l 



Table 2: Comparison table for random case with negative drift 

ratio e/6. Extensive empirical studies showed that the direct numerical scheme that we 
chose to simulate from the SDE was much more sensitive in honoring the rule for choosing 
the discretization step A = 0.001 • (5^/e in the random case than it was for the periodic 
case. So, due to practical limitation on the computational budget, we had to limit to the 
values reported in Table [2] in order to obtain meaningful results. Note that this significant 
computational burden required to produce samples is independent of the scheme, and thus 
provides further impetus for the development of the theoretically best algorithms. 
We also ran simulations for the model 



dX'{t) 



7«' 



X^it) 



dt + ^/^V2DdW{t), X'{0) 



X. 



(30) 



where Q is the same random field as before and the goal is again to estimate the exit 
probability 



X 



\X%0) 



e{e) = P[X^{T^) 

with now x~ = 0, x+ = 0.8 and x = 0.1. 

Notice that this corresponds to the (random) potential function that is drawn in Figure 
[2j The difference between this and the previous model is that here there exists a stable 
point, in particular at x = 0, in the domain of attraction (see Figure [2]). The results were 
qualitatively the same as in Table [2] and are presented in Table [3l 



No. 


e 


6 


e/S 


^o(e) 


Oi{e) 


02{e) 


Po(e) 


Pi(e) 


/52(e) 


1 


0.25 


0.1 


2.5 


1.56e-l 


1.56e-l 


1.56e-l 


7.37e - 4 


4.93e - 4 


5.77e-4 


2 


0.125 


0.04 


3.125 


2.35e-2 


2.39e - 2 


2.35e-2 


2.00e - 3 


6.53e-3 


1.43e-3 


3 


0.0625 


0.018 


3.472 


2.25e-3 


2.32e-3 


2.25e-3 


6.45e - 3 


3.66e - 2 


3.14e-2 


4 


0.03125 


0.008 


3.91 


5.03e-5 


2.78e-5 


4.36e-5 


8.08e - 2 


8.91e-2 


4.54e - 1 


5 


0.025 


0.006 


4.17 


1.38e-5 


5.23e-6 


8.91e-6 


2.25e-l 


7.79e - 2 


1.89e-l 


6 


0.02 


0.0045 


4.44 


2.0e - 7 


3.07e - 7 


3.11e-7 


4.61e-l 


1.16e-2 


2.01e-l 



Table 3: Comparison table for random case with a rest point 



Appendix A. Numerical Schemes for Multiscale Problems 

Assume for simplicity the periodic setup, and denote by Y^ the numerical approximation 
to XI provided by a direct scheme with weak order of convergence p. For such a numerical 
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scheme one has the fohowing error bound, whose proof fohows from standard arguments, 
e.g. |17j . and will not be repeated here. 



Theorem 7.1 If e,6 and the discretization step A are such that Ae/5'^ <^ 1, then for every 
T > and every smooth function f with compact support, there exist constants Cq < oo and 
/iQ > that are independent of e,6 and A such that for all A,e,6 satisfying Ae/S"^ < ho, 

sup \E^J{Xl) - E, J(y„^)| < Co {A{e/6^)f . (31) 

n<T/A 

The bound (j31|) illustrates the computational difficulty in approximating multiscale 
diffusions. Fix an error tolerance level (. The error bound ()3ip indicates that a time step 
of order 



A = o (^e^A 



is needed. The corresponding computational cost per unit of time is 1/A, which becomes 
more expensive as 5, e and 6/e become smaller. 

Of course, the increasing computational cost highlights the importance of important 
sampling or other fast simulation techniques for treating these kinds of problems. Since our 
interest in this work is to study importance sampling and not the numerical methods, we do 
not elaborate here much on the approximation aspect of the problem. Numerical methods 
such as multiscale integrator methods [ISITT], and projective integrator methods [231 138] 
have been proposed to efficiently simulate systems with widely separated time scales. These 
methods turn out to be less costly than direct approximation methods, especially when 6 is 
small. For example, in the case of equation ([T]) and for 6 = e^''^, perturbation analysis [17] 
shows that these methods are less costly than direct approximation when e <^ (^. Observe 
that in the examples of Section [71 the values of e and 5 were not smaller than the overall 
tolerance error (^. Hence, we chose to use direct approximation schemes and to rely on 
parallel computing to carry out the simulation. 
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